On Regular Expression Matching and Deterministic Finite Automata

Author

  • Philip Bille
Abstract

Given a regular expression $R$ and a string $T$, the regular expression matching problem is to determine if $T$ matches any string in the language generated by $R$. The best known solution to the problem uses linear space and $O(nm \log\log n / \log^{3/2} n + n + m)$ time in the worst case [2], where $m$ and $n$ are the lengths of $R$ and $T$, respectively. A common misconception is that we can solve the problem efficiently by building a deterministic finite automaton (DFA) for $R$ using $2^{O(m)}$ space and then running it on $T$ in $O(n)$ time [1]. However, this analysis completely ignores issues of addressing into exponential-sized data structures. An address in a DFA of size $2^{\Omega(m)}$ requires $\Omega(m)$ bits. Hence, on a standard unit-cost word RAM with word length $\Theta(\log n)$ [3], we need at least $\Omega(m / \log n)$ time simply to write an address in the DFA. It follows that traversing the DFA for $R$ uses at least $\Omega(nm / \log n + n + m)$ worst-case time (note that we do not even include DFA construction time). This bound can only be $O(n)$ when $m = O(\log n)$ and is never better than the above best known bound.

Body

Even ignoring construction time, deterministic finite automata do not solve regular expression matching in worst-case linear time.
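The addressing argument in the abstract can be made concrete with a small calculation. The Python sketch below is illustrative only and not part of the paper: it assumes a worst-case DFA with exactly 2^m states, a word length of exactly ceil(log2 n) bits (standing in for the Θ(log n) word length), and one state address written per text character. The function names `words_per_address` and `traversal_word_ops` are hypothetical.

```python
def words_per_address(m: int, n: int) -> int:
    """Machine words needed to hold one state address.

    Assumptions (not from the paper): the DFA has exactly 2**m states,
    so an address needs m bits, and a machine word has ceil(log2 n) bits.
    """
    word_bits = max(1, (n - 1).bit_length())  # ceil(log2 n) for n >= 2
    address_bits = m                          # log2(2**m) = m bits
    return -(-address_bits // word_bits)      # integer ceiling division


def traversal_word_ops(m: int, n: int) -> int:
    """Word operations spent writing addresses while scanning a text of
    length n, assuming one state address is written per character."""
    return n * words_per_address(m, n)


if __name__ == "__main__":
    n = 1024  # text length, so a word holds 10 bits under the assumption above
    for m in (10, 11, 100):
        print(m, words_per_address(m, n), traversal_word_ops(m, n))
```

Under these assumptions, an expression of length m = 10 fits one address in a single word for n = 1024, while m = 100 needs 10 words per address. The per-character cost therefore grows as m / log n, which is the nm / log n term of the lower bound.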

Similar resources

Treating Insomnia, Amnesia, and Acalculia in Regular Expression Matching

Regular expressions provide a flexible means for matching strings and they are often used in data-intensive applications. They are formally equivalent to either deterministic finite automata (DFAs) or nondeterministic finite automata (NFAs). Both DFAs and NFAs are affected by two problems known as amnesia and acalculia, and DFAs are also affected by a problem known as insomnia. Existing techniqu...

Improving NFA-Based Signature Matching Using Ordered Binary Decision Diagrams

Network intrusion detection systems (NIDS) make extensive use of regular expressions as attack signatures. Internally, NIDS represent and operate these signatures using finite automata. Existing representations of finite automata present a well-known time-space tradeoff: Deterministic automata (DFAs) provide fast matching but are memory intensive, while non-deterministic automata (NFAs) are spa...

Computer Science at Kent Regular expression matching using associative memory

This paper describes a method for the implementation of regular expression matching based on the use of a form of associative (or content-addressable) memory. The regular expression matching is performed by converting the regular expression into a deterministic finite automaton, but then using associative memory to hold the state transition information. Rather than try t...

On the Semantics of Atomic Subgroups in Practical Regular Expressions

Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in, for example, Perl, Java, Ruby and .NET also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent ne...

OFA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection

To match the signatures of malicious traffic across packet boundaries, network-intrusion detection (and prevention) systems (NIDS) typically perform pattern matching after flow reassembly or packet reordering. However, this may lead to the need for large packet buffers, making detection vulnerable to denial-of-service (DoS) attacks, whereby attackers exhaust the buffer capacity by sending long ...

Journal title:
  • TinyToCS

Volume 3  Issue 

Pages  -

Publication date 2015